-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Full refactorization of modelling and visualisation functions #200
Conversation
…cation of priors
…odel' This function allows to extact an specific loo estimate for model parameters like 'seroreversion_rate' or 'foi'. This introduce changes in: - 'summarise_seromodel' - 'plot_summary' - 'plot_seromodel'
This pull request:
(Note that results may be inacurrate if you branched from an outdated version of the target branch.) |
This pull request:
(Note that results may be inacurrate if you branched from an outdated version of the target branch.) |
Add examples for: - `fit_seromodel` - `get_foi_index` - `plot_serosurvey`
This pull request:
(Note that results may be inacurrate if you branched from an outdated version of the target branch.) |
DESCRIPTION
Outdated
@@ -1,7 +1,7 @@ | |||
Package: serofoi | |||
Type: Package | |||
Title: Estimates the Force-of-Infection of a given pathogen from population based seroprevalence studies |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
population-based
build_stan_data <- function( | ||
serosurvey, | ||
model_type = "constant", | ||
foi_prior = sf_uniform(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the config.yml file and it did seem to have a max default of 10 (which I think is fine); I just wanted to double-check that I read it correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's right. The default uniform distribution is sf_uniform(0, 10)
and for the normal distribution it is sf_normal(0, 1)
.
Hi @ntorresd, This is so great. Thank you so so much for this work -- it lays the foundation for a whole lot more work for us all and does so in a really elegant way. (Well done team! Here's to Bogota 2025...) A few thoughts. These are generally minor and could be discussed and handled in separate PRs if this isn't the right forum:
|
Hi @ben18785, thanks a lot for your feedback! I'll address some of your points in this PR and open issues to address the others, as there are some closely related to user's feedback collected during the previous user test:
Right now, this variable is named
I'll implement this one before merging for the sake of names consistency.
I've opened #205 to address this.
I think we should address this one in #204 , opened by one of our users @JDConejeros during the user test; it's a good chance to give a compelling explanation of this concept.
I added a comment in #202 about this, so we can discuss this further over there.
Yes, this is a must. I opened #206 to address this.
I'd rather keep for the time being. During the user test it was useful to have it exported.
I haven't run it yet, but it should be very low as the only part of the code covered by unit testing right now is the data simulation module. We've discussed unit testing with @jpavlich and we decided to start over as the structure of the packaged has changed so much. After merging this PR to dev, I'll open some issues about this. The idea is to have >90% coverage before submitting to CRAN.
Not a silly question at all. As mentioned before, we are not testing for any of this right now. We should implement this before merging to main, so this is something we will discuss soon enough. |
This pull request:
(Note that results may be inacurrate if you branched from an outdated version of the target branch.) |
This PR replace the old modelling and visualisation functions for refactored versions agreed on by the development team of the package (@zmcucunuba, @ben18785, @sumalibajaj, @jpavlich, @ekamau) earlier this year.
We designed this new functions for the selection and the specification of the Bayesian models to be more flexible than before, as well as including additional constant/time/age varying models with the possibility to estimate the seroreversion rate$\mu$ . In particular, the usage of
fit_seromodel()
went from (e.g.):to
Note that we refer to the serological survey now as
serosurvey
rather thanserodata
, and we restrict it to have the following data:tsur
: Year in which the survey took placeage_min
: Floor value of the average between age_min and age_maxage_max
: The size of the samplesample_size
: Number of samples for each age groupn_seropositive
: Number of positive samples for each age groupThe function
sf_normal
is one of a set of auxiliary functions designed to specify the prior distributions of the parameters to be estimated (FOI or seroreversion rate), as suggested in #193 . Currently available priors aresf_normal()
andsf_uniform()
, corresponding to Gaussian and uniform priors respectively.To illustrate the current pipeline, consider a disease with constant FOI$\lambda = 0.02$ and seroreversion rate $\mu=0.01$ :
and a serological survey with the following features:
We can use
simulate_serosurvey()
, introduced in #199, to simulate statistically consistent seropositive countsn_seropositive
for each age group, according to our set of serological models. In this case, I use the time-varying model:which we can visualise using
plot_serosurvey
:Implementing the constant model with and without seroreversion by means of
fit_seromodel()
:Visualizing the results:
Note that the model estimating the seroreversion rate (right panel in the image), no only is a better fit for this serological survey according to the elpd value, but it also accurately estimates both the FOI and seroreversion rate we used to simulate the serosurvey.
Another difference with previous versions lies in how we visualize the estimated parameters. Since the constant model is estimating a single FOI value, we only show the estimated value with its corresponding credible interval (we use to plot it instead). However, for the time and age varying models we estimate several values of the FOI in the time/age span of the survey, which we visualize graphically. Take for instance the time varying model with seroreversion:
Here we plot the estimated values of the FOI as a function of time (blue line and shadow) and the R-hat estimates for each estimated value (black dots). Note that, even though we ran the model for a large number of iterations - 4000, the model did not converged (since there are R-hat values over 1.01). We can improve the convergence of the model by estimating less values of the FOI. To specify the time intervals for which FOI values will be estimated, we index them by means of
get_foi_index()
:This function returns a list of indexes that labels each year (or age, for age-varying models). The idea is that a single FOI value will be estimated for each index, meaning that 10 FOI values will be estimated in this particular case:
Now the model converges. As long as
foi_index
length covers the whole time/age span of the serological survey, less regular indexationsfoi_index
can be specified.